convolutional filter
f8e55d98b0c2569bd0aa25b076e6b3f8-Supplemental-Conference.pdf
Motion Compensation. We compare our method to the traditional motion-compensated coding approach that forms the core of inter-picture coding in well-established compression standards such as MPEG. Block matching is an essential component of these standards, allowing the compression of video content by up to three orders of magnitude with moderate loss of information. For each block in a frame, typical coders search for the most similar spatially displaced block in the previous frame (typically measured with MSE), and communicate the displacement coordinates to allow prediction of frame content by translating blocks of the (already transmitted) previous frame. We implemented a "diamond search" algorithm [29] operating on blocks of $8 \times 8$ pixels, with a maximal search distance of 8 pixels, which balances accuracy of motion estimates and speed of estimation (the search step is computationally intensive). We use the estimated displacements to perform causal motion compensation (cMC), using displacement vectors estimated from the previous two observed frames ($x_{t-1}$ and $x_t$) to predict the next frame ($x_{t+1}$) rather than the current one (as in MPEG).
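The causal prediction step described above can be illustrated with a minimal sketch. It makes several assumptions that are not in the text: it uses an exhaustive search over all displacements rather than the diamond search of [29], it works on single-channel frames stored as NumPy arrays, and it assumes that the motion estimated for a block between $x_{t-1}$ and $x_t$ persists unchanged to $x_{t+1}$. The function names `block_match` and `causal_motion_compensation` are hypothetical.

```python
import numpy as np

def block_match(prev, curr, block=8, max_disp=8):
    """For each block of `curr`, find the displacement (dy, dx) of the
    best-matching block in `prev` under MSE.
    Exhaustive search (an assumption; the text uses diamond search)."""
    H, W = curr.shape
    vectors = np.zeros((H // block, W // block, 2), dtype=int)
    for bi in range(H // block):
        for bj in range(W // block):
            y, x = bi * block, bj * block
            target = curr[y:y + block, x:x + block]
            best = np.inf
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    # Skip candidates that fall outside the frame.
                    if yy < 0 or xx < 0 or yy + block > H or xx + block > W:
                        continue
                    cand = prev[yy:yy + block, xx:xx + block]
                    mse = np.mean((target - cand) ** 2)
                    if mse < best:
                        best = mse
                        vectors[bi, bj] = (dy, dx)
    return vectors

def causal_motion_compensation(x_prev, x_curr, block=8, max_disp=8):
    """Predict x_{t+1} from x_{t-1} and x_t.
    Displacements are estimated between the two observed frames, then the
    same displacements are applied to blocks of x_t (constant-motion
    assumption)."""
    vectors = block_match(x_prev, x_curr, block, max_disp)
    pred = x_curr.copy()
    for bi in range(vectors.shape[0]):
        for bj in range(vectors.shape[1]):
            y, x = bi * block, bj * block
            dy, dx = vectors[bi, bj]
            # The matched source block was inside x_prev, so the same
            # window is inside x_curr (both frames have the same shape).
            pred[y:y + block, x:x + block] = \
                x_curr[y + dy:y + dy + block, x + dx:x + dx + block]
    return pred
```

For a frame sequence that translates rigidly by the same offset at every step, this prediction is exact for every block whose matched source lies inside the frame. Blocks near the border fall back to the best in-frame match.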
CNNpack: Packing Convolutional Neural Networks in the Frequency Domain
Yunhe Wang, Chang Xu, Shan You, Dacheng Tao, Chao Xu
Deep convolutional neural networks (CNNs) are successfully used in a number of applications. However, their storage and computational requirements have largely prevented their widespread use on mobile devices. Here we present an effective CNN compression approach in the frequency domain, which focuses not only on smaller weights but on all the weights and their underlying connections. By treating convolutional filters as images, we decompose their representations in the frequency domain as common parts (i.e., cluster centers) shared by other similar filters and their individual private parts (i.e., individual residuals). A large number of low-energy frequency coefficients in both parts can be discarded to produce high compression without significantly compromising accuracy. We relax the computational burden of convolution operations in CNNs by linearly combining the convolution responses of discrete cosine transform (DCT) bases. The compression and speed-up ratios of the proposed algorithm are thoroughly analyzed and evaluated on benchmark image datasets to demonstrate its superiority over state-of-the-art methods.
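The idea of combining convolution responses of DCT bases rests on linearity: if a filter is written as a weighted sum of DCT basis images, its response to an input is the same weighted sum of the basis responses. The sketch below illustrates only that identity, plus a simple magnitude-based pruning of coefficients. It is not the CNNpack implementation; the helper names, the top-`keep` pruning rule, and the use of cross-correlation (the usual CNN "convolution") are assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C of shape (n, n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def correlate2d_valid(x, f):
    """Plain 'valid' 2-D cross-correlation of image x with square filter f."""
    n = f.shape[0]
    out = np.empty((x.shape[0] - n + 1, x.shape[1] - n + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + n, c:c + n] * f)
    return out

def dct_basis_responses(x, n):
    """Responses of x to all n*n DCT basis images; shape (n, n, H', W').
    These are computed once and can be shared by every filter."""
    C = dct_matrix(n)
    return np.array([[correlate2d_valid(x, np.outer(C[u], C[v]))
                      for v in range(n)] for u in range(n)])

def response_from_coefficients(coef, basis_resp):
    """Linearly combine basis responses using a filter's DCT coefficients."""
    return np.tensordot(coef, basis_resp, axes=([0, 1], [0, 1]))

def prune_coefficients(coef, keep):
    """Zero all but the `keep` largest-magnitude coefficients
    (a hypothetical pruning rule, used here only for illustration)."""
    flat = np.abs(coef).ravel()
    if keep >= flat.size:
        return coef.copy()
    threshold = np.sort(flat)[-keep]
    return np.where(np.abs(coef) >= threshold, coef, 0.0)
```

With `coef = C @ F @ C.T` for a filter `F` and `C = dct_matrix(n)`, `response_from_coefficients(coef, dct_basis_responses(x, n))` reproduces `correlate2d_valid(x, F)` up to floating-point error. After pruning, the cost of the combination step drops with the number of nonzero coefficients.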
How Many Samples are Needed to Estimate a Convolutional Neural Network?
A widespread folklore for explaining the success of Convolutional Neural Networks (CNNs) is that CNNs use a more compact representation than the Fully-connected Neural Network (FNN) and thus require fewer training samples to accurately estimate their parameters. We initiate the study of rigorously characterizing the sample complexity of estimating CNNs. We show that for an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error of $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas the sample complexity for its FNN counterpart is lower bounded by $\Omega(d/\epsilon^2)$ samples. Since $m \ll d$ in typical settings, this result demonstrates the advantage of using a CNN. We further consider the sample complexity of estimating a one-hidden-layer CNN with linear activation where both the $m$-dimensional convolutional filter and the $r$-dimensional output weights are unknown. For this model, we show that the sample complexity is $\widetilde{O}\left((m+r)/\epsilon^2\right)$ when the ratio between the stride size and the filter size is a constant. For both models, we also present lower bounds showing our sample complexities are tight up to logarithmic factors. Our main tools for deriving these results are a localized empirical process analysis and a new lemma characterizing the convolutional structure. We believe that these tools may inspire further developments in understanding CNNs.
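The gap between the $m$-dependent and $d$-dependent rates can be seen in a small simulation. The setup below is a toy model chosen for illustration and is not taken from the paper: the filter is applied to non-overlapping patches (stride equal to filter size), the patch responses are summed, inputs are standard Gaussian, and both models are fit by ordinary least squares. Under these assumptions the population squared prediction error equals the squared distance between the estimated and true $d$-dimensional linear predictors. The function name `simulate` and all parameter defaults are hypothetical.

```python
import numpy as np

def simulate(n, m=4, num_patches=16, sigma=1.0, seed=0):
    """Compare least-squares estimation with and without weight sharing.
    Returns (cnn_error, fnn_error): population squared prediction errors
    for inputs x ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    d = m * num_patches

    # Ground truth: one m-dimensional filter shared across all patches,
    # equivalent to a d-dimensional linear predictor with tied weights.
    w = rng.standard_normal(m)
    theta = np.tile(w, num_patches)

    X = rng.standard_normal((n, d))
    y = X @ theta + sigma * rng.standard_normal(n)

    # CNN-style estimator: weight sharing reduces the regression to
    # m unknowns acting on the sum of the patches.
    Z = X.reshape(n, num_patches, m).sum(axis=1)
    w_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    theta_cnn = np.tile(w_hat, num_patches)

    # FNN-style estimator: all d weights are free parameters.
    theta_fnn, *_ = np.linalg.lstsq(X, y, rcond=None)

    cnn_error = float(np.sum((theta_cnn - theta) ** 2))
    fnn_error = float(np.sum((theta_fnn - theta) ** 2))
    return cnn_error, fnn_error
```

Calling `simulate(256)` with the defaults fits 4 shared weights in the first case and 64 free weights in the second from the same 256 samples. In this toy setting the weight-shared estimate should have a noticeably smaller error, in line with the $m$ versus $d$ scaling stated in the abstract.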